Skip to content
This repository has been archived by the owner on Oct 15, 2020. It is now read-only.

Precompute fixed-length string dtypes #20

Merged
merged 3 commits into from
Aug 24, 2020

Conversation

eric-czech
Copy link
Collaborator

#12

This also updates the test data to contain alleles and sample id strings that are longer to ensure that inferred data types are correct.

The function _max_str_len should probably be a central utility, but we can address that in https://github.com/pystatgen/sgkit/issues/90.

@eric-czech eric-czech requested a review from tomwhite August 5, 2020 12:21
Copy link
Collaborator

@tomwhite tomwhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - just a small request to doc how test data was generated.

1 1:7:A:C 0.0 7 C A
1 1:8:A:C 0.0 8 C A
1 1:9:A:C 0.0 9 C A
1 1:1:G:CGCGCG 0.0 1 CGCGCG G
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document how you generated this file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm how about this:

  • I'll add some notes for now saying this was created using software in a separate environment (hail)
  • Create an issue to document this properly and include the associated code that created it
  • Wait to see where the multi-repo conversation goes (https://github.com/pystatgen/sgkit/issues/65#issuecomment-670049733)
  • Add this to the validation folder if we merge repos since it is essentially the same process as the one I used in the REGENIE and HWE PRs

That sound good? I'd rather not bootstrap another validation-like concept with all the associated CI changes if I can help it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomwhite does @eric-czech's proposal sound good?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes (sorry forgot to reply) - sounds great!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#29

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some notes for now saying this was created using software in a separate environment (hail)

467d4bd#diff-33e9afa89862f603b25f9c5abf5ef334R6

@eric-czech
Copy link
Collaborator Author

@tomwhite can I get a sign off on those updates when you get a chance?

@eric-czech eric-czech merged commit f62fc70 into sgkit-dev:master Aug 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants